
raphaelsty/knowledge

Knowledge

An opinionated reading list drawn from 450+ people who shape AI, science, and software today. A place to learn.

No ads, no algorithm, just what they read.


Demo


Knowledge is a personal library. A place to keep what you found interesting on GitHub, on X, in a blog, in a paper, and to read it when you have time. A talk that mattered last week is still worth watching this week. A paper from two months ago still teaches you what you came for. Build your own library, or sit in someone else's for an hour. Spend that hour with Andrej Karpathy's bookmarks and you learn what fifteen years of ML looks like.


Tap the heart on any card and the doc lands in your library, indexed and searchable. Every contributor has a personal page that reads like a curated bookshelf: their tweets, their stars, the papers they wrote, the videos they show up in. Browse it the way you'd browse a friend's bookmarks folder.


Libraries to visit

A few rooms worth walking into.

Geoffrey Hinton Yoshua Bengio Yann LeCun Andrej Karpathy Ian Goodfellow Ilya Sutskever

Demis Hassabis Dario Amodei Sam Altman Greg Brockman Wojciech Zaremba Mark Chen

Oriol Vinyals Noam Shazeer Jason Wei Jared Kaplan Sam Bowman Aravind Srinivas

Tri Dao Albert Gu Lucas Beyer Tim Dettmers Horace He Sander Dieleman

Omar Khattab Matei Zaharia François Chollet Douwe Kiela Edward Grefenstette Kyunghyun Cho

Clément Delangue Thomas Wolf Merve Noyan Philipp Schmid Stas Bekman Jay Alammar

Sara Hooker Anima Anandkumar Chelsea Finn Natasha Jaques Maithra Raghu Irina Rish

Danqi Chen Rachel Thomas Percy Liang Graham Neubig Chris Olah Sebastian Raschka

Pieter Abbeel Sergey Levine Simon Willison Max Halford Pieter Levels Lex Fridman

Guillaume Lample Thomas Scialom Gilles Louppe Tim Rocktäschel François Fleuret Arvind Narayanan

Iacopo Poli Antoine Chaffin Manuel Faysse Tony Wu Amélie Chatelain Arthur Mensch

…or wander through all 450+ libraries.


Search

Type a query. ColBERT searches the actual contents of every doc, not just titles, and ranks them by how well the words match. Search one library, several at once, or the whole shared corpus.
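ColBERT-style models score a document by late interaction: each query token is matched to its most similar document token, and those best matches are summed (MaxSim). The sketch below illustrates that scoring rule in plain Python. It is not the repo's implementation, and the two-dimensional token embeddings are made up for the example.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products act as cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: for each query token, keep its best-matching
    document token, then sum those maxima."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(sum(a * b for a, b in zip(qt, dt)) for dt in d) for qt in q)

def rank(query_tokens, docs):
    """Return document ids sorted by descending MaxSim score."""
    scored = {doc_id: maxsim(query_tokens, toks) for doc_id, toks in docs.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Toy token embeddings (hypothetical values, one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],  # matches both query tokens
    "doc_b": [[1.0, 0.0], [1.0, 0.1]],               # matches only the first
}
print(rank(query, docs))  # ['doc_a', 'doc_b']
```

Because every token in the document takes part in the score, a match deep in the body counts as much as a match in the title.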


MCP

The API exposes an MCP server at /mcp with fifteen tools. Twelve are public; three require a bearer token that you mint at /profile.

Search & discover

  • search: query a single library
  • search_across: query several libraries at once
  • search_personalities: find libraries by description
  • find_similar: docs related to one you've read
  • latest: most recent docs in a library
  • feed: chronological cross-library feed
  • intersect_documents: docs shared between libraries

Catalog

  • list_personalities: every library
  • list_sources: sources for a library
  • list_tags: tags for a library
  • get_personality: one library's metadata
  • get_document: one doc by URL

Authenticated

  • my_library: your saved docs
  • my_timeline: your activity feed
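  • save_document: save a doc to your library

To use the authenticated tools, pass your token as a bearer header, for example when registering the server with Claude Code:

claude mcp add knowledge --transport http https://knowledge-web.org/mcp \
  --header "Authorization: Bearer kn_..."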
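
For agents that talk to the server without a ready-made client, the sketch below shows the shape of an MCP tool call. MCP messages are JSON-RPC 2.0, and tools are invoked with the tools/call method. The argument names (query, library) are assumptions for illustration: the real input schema of each tool comes from the server's tools/list response. Nothing here is sent over the network, and a real client would first perform the MCP initialize handshake, which an MCP client library does for you.

```python
import json

MCP_URL = "https://knowledge-web.org/mcp"  # endpoint from the command above

def tool_call(name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 envelope for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

def headers(token=None):
    """HTTP headers for the streamable HTTP transport.
    The token is only needed for the three authenticated tools."""
    h = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if token:
        h["Authorization"] = f"Bearer {token}"
    return h

# Hypothetical arguments: check the tool's schema before relying on these names.
payload = tool_call("search", {"query": "speculative decoding", "library": "karpathy"})
print(json.dumps(payload, indent=2))
```

Public tools such as search can be called with headers() alone. my_library, my_timeline, and save_document need headers(token) with the token minted at /profile.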

How it works

The pipeline runs all day, walking through each personality's sources in a continuous loop: GitHub stars, X posts, Hacker News submissions, arXiv papers, Hugging Face likes, Reddit, Stack Overflow, Wikipedia, and the rest. Each document gets cleaned, tagged, and written to Postgres. A separate indexer daemon picks up new rows and embeds them with ColBERT, so search stays current without blocking the main pipeline. When you type a query, the API serves ranked results from a next-plaid PLAID index sitting on local disk. Your browser then does a second pass, re-ranking those results with an unquantized ColBERT running in WASM. The whole stack lives in this repo: sources/ is Python (fetchers and orchestrator), api/ is Rust (search, ingest, auth, MCP), web/ is plain HTML and JS.


Why it helps

Most platforms compete for your attention with infinite feeds, ads between every post, notifications you didn't ask for, recommendations from an algorithm that learned to manipulate you. Knowledge does the opposite: small, finite libraries you can return to. Use it to research a topic across experts. Search 450+ libraries at once for "speculative decoding" and you get curated context instead of random Google noise. Browse Karpathy's GitHub stars, Yann LeCun's papers, Geoffrey Hinton's interviews, all in one place. Stop doomscrolling X. The site compresses someone's year of tweets into a static page you can read once and close. Sign in to save what matters, search your own library, mint a token to wire the MCP server into Claude, Cursor, or any agent that speaks MCP.


Under the hood

Knowledge has always been a showcase for the information retrieval tools I'm building. It started four years ago on a cherche backend and now runs on next-plaid and pylate-rs, the same search stack behind ColGREP, the semantic code-search tool. The API is a single Rust binary, the pipeline is Python, the frontend is plain HTML and JS. Everything runs on a single Hetzner VPS.


The pipeline parses about a dozen sources: GitHub stars, X posts and likes, Hacker News submissions and comments, arXiv, Google Scholar, DBLP, Hugging Face likes, YouTube channels, Zotero libraries, Reddit, Stack Overflow, Wikipedia references, plus any blog you can point at via RSS or sitemap. As of today that's 450+ personal libraries, around 440,000 documents indexed.


So yes, when you type a query a quantized ColBERT runs on the server's CPU against a next-plaid index, and then on your phone an unquantized ColBERT in WASM re-ranks the results. The browser-side full-precision re-rank is, as far as I know, an original trick.
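
The two-pass idea can be sketched as follows: a cheap first pass over compressed embeddings picks candidates, then a full-precision pass re-orders them. This is an illustration under loud assumptions. The 1-bit sign quantization below is a crude stand-in, not the compression next-plaid uses, and the embeddings are toy values.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

def quantize(tokens):
    """Crude 1-bit quantization (assumption): keep only the sign of each dimension."""
    return [[1.0 if x >= 0 else -1.0 for x in tok] for tok in tokens]

def first_pass(query_tokens, docs, k):
    """Server-side stand-in: score against quantized embeddings, keep the top k."""
    scored = {doc_id: maxsim(query_tokens, quantize(toks)) for doc_id, toks in docs.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

def rerank(query_tokens, docs, candidates):
    """Browser-side stand-in: re-score only the candidates at full precision."""
    scored = {doc_id: maxsim(query_tokens, docs[doc_id]) for doc_id in candidates}
    return sorted(scored, key=scored.get, reverse=True)

# Toy data: one query token, one token per document (hypothetical values).
query = [[0.8, 0.6]]
docs = {
    "doc_a": [[0.9, -0.1]],
    "doc_b": [[0.3, 0.3]],
    "doc_c": [[-0.5, -0.5]],
}

candidates = first_pass(query, docs, k=2)
print(candidates)                       # ['doc_b', 'doc_a']
print(rerank(query, docs, candidates))  # ['doc_a', 'doc_b']
```

The coarse pass is good enough to discard doc_c but puts doc_b first. The full-precision pass only touches the two survivors and restores the better order.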


Cost and hosting

Free to use, free to read. The whole site runs on a single Hetzner CX33 in Helsinki: 4 vCPUs, 8 GB RAM, around $15 a month all in. No CDN, no managed Postgres, no Cloudflare proxy in front of the app. The 3.8 GB ColBERT index sits on local disk and the API serves it directly. To self-host you clone the repo, set five env vars, point a domain at the box, push to main. PolyForm Noncommercial 1.0.0 covers personal and educational use.


License

PolyForm Noncommercial 1.0.0. Free to use, modify, and self-host for non-commercial purposes. Get in touch for anything else.


Citation

@software{sourty2026knowledge,
  author  = {Sourty, Raphaël},
  title   = {Knowledge: a library for the internet},
  year    = {2026},
  url     = {https://github.com/raphaelsty/knowledge},
  license = {PolyForm-Noncommercial-1.0.0}
}
